Adaptive bimodal sensor fusion for automatic speechreading
Abstract
We present recent work on improving the performance of automated speech recognizers by using additional visual information (lip-/speechreading), achieving error reduction of up to […]. This paper focuses on different methods of combining the visual and acoustic data to improve the recognition performance. We show this on an extension of an existing state-of-the-art speech recognition system, a modular MS-TDNN. We have developed adaptive combination methods at several levels of the recognition network. Additional information such as estimated signal-to-noise ratio (SNR) is used in some cases. The results of the different combination methods are shown for clean speech and data with artificial noise (white, music, motor). The new combination methods adapt automatically to varying noise conditions, making hand-tuned parameters unnecessary.
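As a rough illustration of the SNR-driven combination the abstract mentions, the sketch below weights per-class acoustic and visual log scores by a weight derived from the estimated SNR. It is not the paper's MS-TDNN method: the linear SNR-to-weight mapping, the 0 dB and 30 dB thresholds, and the names `snr_to_acoustic_weight` and `fuse_scores` are all assumptions made for this example.

```python
import numpy as np


def snr_to_acoustic_weight(snr_db, snr_low=0.0, snr_high=30.0):
    """Map an estimated SNR in dB to an acoustic weight in [0, 1].

    The linear ramp and both thresholds are illustrative assumptions,
    not values taken from the paper.
    """
    w = (snr_db - snr_low) / (snr_high - snr_low)
    return float(np.clip(w, 0.0, 1.0))


def fuse_scores(acoustic_logp, visual_logp, snr_db):
    """Combine per-class log scores with an SNR-dependent weight.

    High SNR trusts the acoustic stream; low SNR shifts trust to the
    visual (lip) stream.
    """
    a = np.asarray(acoustic_logp, dtype=float)
    v = np.asarray(visual_logp, dtype=float)
    lam = snr_to_acoustic_weight(snr_db)
    return lam * a + (1.0 - lam) * v


# Hypothetical three-class scores in which the two streams disagree.
acoustic = np.log([0.7, 0.2, 0.1])
visual = np.log([0.2, 0.6, 0.2])

clean_decision = int(np.argmax(fuse_scores(acoustic, visual, snr_db=30.0)))
noisy_decision = int(np.argmax(fuse_scores(acoustic, visual, snr_db=0.0)))
```

With these made-up scores, the fused decision follows the acoustic stream at high SNR and the visual stream at low SNR, so no per-condition weight has to be set by hand.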
Similar resources
A New Approach to Self-Localization for Mobile Robots Using Sensor Data Fusion
This paper proposes a new approach for calibration of dead reckoning process. Using the well-known UMBmark (University of Michigan Benchmark) is not sufficient for a desirable calibration of dead reckoning. Besides, existing calibration methods usually require explicit measurement of actual motion of the robot. Some recent methods use the smart encoder trailer or long range finder sensors such ...
Large-vocabulary audio-visual speech recognition by machines and humans
We compare automatic recognition with human perception of audio-visual speech, in the large-vocabulary, continuous speech recognition (LVCSR) domain. Specifically, we study the benefit of the visual modality for both machines and humans, when combined with audio degraded by speech-babble noise at various signal-to-noise ratios (SNRs). We first consider an automatic speechreading system with a p...
Bilingual corpus for AVASR using multiple sensors and depth information
In this paper we present the Bilingual Audio-Visual Corpus with Depth information (BAVCD). The database contains utterances of connected digits, spoken by 15 subjects in English and 6 subjects in Greek, and collected employing multiple audio-visual sensors. Among them, of particular interest is the use of the Microsoft Kinect device, which is able to capture facial depth images using the struct...
Designing a Home Security System using Sensor Data Fusion with DST and DSMT Methods
Today, owing to the importance and necessity of implementing security systems in homes and other buildings, systems that use sensor fusion to offer higher certainty at lower cost are increasingly attractive to researchers as practical, high-performance solutions. In this paper, the application of Dempster-Shafer evidential theory and also the newer, more general Dezert-Smarandache theory ...
Exploiting lower face symmetry in appearance-based automatic speechreading
Appearance-based visual speech feature extraction is being widely used in the automatic speechreading and audio-visual speech recognition literature. In its most common application, the discrete cosine transform (DCT) is utilized to compress the image of the speaker’s mouth region-of-interest (ROI), and the highest energy spatial frequency components are retained as visual features. Good genera...
Publication date: 1996